Information theory is a statistical theory dealing with the relative state of detectors and physical
systems. Because of this physicality of information, the classical framework of Shannon needs to 
be extended to deal with quantum detectors, perhaps moving at relativistic speeds, or even within 
curved space-time. Considerable progress toward such a theory has been achieved in the last fifteen 
years, while much is still not understood. This review recapitulates some milestones along this road, 
and speculates about future ones. 



I. PREFACE: FROM NUCLEI TO QUANTUM 
INFORMATION 

I am sure I am one of the more junior contributors 
to this volume celebrating Gerry Brown's 85th birthday, 
and still I've known him for 25 years. I arrived as a young 
graduate student at Stony Brook University in 1986, and 
Gerry immediately introduced me to every member of 
his Nuclear Theory group, ending with his postdoc Ismail
Zahed. He pointed to a chair in Ismail's office, said:
"You guys talk", and left. I started to work with Ismail 
that day, and when he was promoted to Assistant Profes- 
sor I became his first graduate student. Gerry and I only 
started to work together closely within the last two years 
of my Ph.D., and the collaboration intensified when he 
took me on his yearly Spring visits to the Kellogg Radia- 
tion Laboratory at the California Institute of Technology. 
There, I had the opportunity to meet Hans Bethe, who 
visited Caltech every Spring to work with Gerry. Over 
the following years, Hans and I became good friends and 
Hans's influence on my growth as a scientist would end 
up rivaling the influence that Gerry had on me [1]. In
particular Hans was always very interested in my shifting 
interests from nuclear and high energy theory first to- 
wards quantum information theory and the foundations 
of quantum mechanics, and then to theoretical biology. 
At the same time, Gerry and Hans's collaboration on the 
physics of binary stars and in particular black holes con- 
tinued to intrigue me. I ended up staying at Caltech for 
12 years. 

While I spend most of my time now working in biol- 
ogy, I still sometimes return to work in physics. People 
like Gerry and Hans have reinforced to me the fun that 
comes with attempting to understand the universe's ba- 
sic principles, and when lucky enough, unravel a few of 
them. Perhaps it is not a coincidence that one of the 
striking applications of quantum relativistic information 
theory that I describe below is to the physics of black 
holes. Gerry and I discussed black holes and binary stars 
endlessly on walks in the mountains adjacent to Caltech, 
and on the phone (often on Sunday mornings) when he 
was back in New York. Why did I store away article after 
article on black holes in the 1990s when I wasn't nearly 
working on the subject? I am sure it was Gerry's influence,
who taught me to go after your gut instinct, and
not worry if you are called crazy. I've been called crazy 
in many a referee's review, and I've come to realize that 
this usually signals that I am on to something. Thus I 
dedicate this article to you Gerry: there are crazy things 
buried in here too. 



II. ENTROPY AND INFORMATION: 
CLASSICAL THEORY 

Since Shannon's historical pair of papers [2], informa- 
tion theory has changed from an engineering discipline to 
a full-fledged theory within physics [3] . While a consider- 
able part of Shannon's theory deals with communication 
channels and codes [1], the concepts of entropy and in- 
formation he introduced are crucial to our understanding 
of the physics of measurement, and turn out to be more 
general than thermodynamical entropy. Thus, informa- 
tion theory represents an important part of statistical 
physics both at equilibrium and away from it. 

In the following, I present an overview of some crucial 
aspects of entropy and information in classical and quan- 
tum physics, with extensions to the special and general 
theory of relativity. While not exhaustive, the treatment 
is at an introductory level, with pointers to the technical 
literature where appropriate. 



A. Entropy 

The concepts of entropy and information are the cen- 
tral constructs of Shannon's theory. They quantify the 
ability of observers to make predictions, in particular how 
well an observer equipped with a specific measurement 
apparatus can make predictions about another physical 
system. Shannon entropies (also known as uncertain- 
ties) are defined for mathematical objects called random 
variables. A random variable X is a mathematical ob- 
ject that can take on a finite number of discrete states 
$x_i$, where i = 1,...,N, with probabilities $p_i$. Now, physical
systems are not mathematical objects, nor are their
states necessarily discrete. However, if we want to quan- 
tify our uncertainty about the state of a physical system, 
then in reality we only need to quantify our uncertainty
about the possible outcomes of a measurement of that 
system. In other words, an observer's maximal uncer- 
tainty about a system is not a property of the system, 
but rather a property of the measurement device with 
which the observer is about to examine the system. For 
example, suppose I am armed with a measurement de- 
vice that is simply a "presence-detector" . Then the max- 
imal uncertainty I have about the physical system under 
consideration is 1 bit, which is the amount of potential 
information I can obtain about that system, given this 
measurement device. 

As a consequence, in information theory the entropy 
of a physical system is undefined if we do not specify the 
device that we are going to use to reduce that entropy. 
A standard example for a random variable (that is also
a physical system) is the fair six-sided die. Usually, the
maximal entropy attributed to this system is $\log_2 6 \approx 2.585$ bits.
Is this all there is to know about this system? What if we 
are interested not only in the face of the die that is up, 
but also the angle that the die has made with respect 
to due North? Further, since the die is physical, it is 
made of molecules and these can be in different states 
depending on the temperature of the system. Are those 
knowable? What about the state of the atoms making 
up the molecules? All these could conceivably provide 
labels such that the number of states to describe the die 
is in reality much larger. What about the state of the 
nuclei? Or the quarks and gluons inside those? 

This type of thinking makes it clear that we cannot 
speak about the entropy of an isolated system without 
reference to the coarse-graining of states that is implied 
by the choice of detector (but I will comment on the 
continuous variable limit of entropies below). So, even 
though detectors exist that record continuous variables 
(such as, say, a mercury thermometer), each detector has 
a finite resolution such that it is indeed appropriate to 
consider only the discrete version of the Shannon entropy, 
which is given in terms of the probabilities pi as |61j 

$$H(X) = -\sum_{i=1}^{N} p_i \log p_i . \tag{1}$$

For any physical system, how are those probabilities 
obtained? In principle, this can be done both by experi- 
ment and by theory. Once I have defined the N possible 
states of my system by choosing a detector for it, the a 
priori maximal entropy is defined as 

$$H_{\max} = \log N . \tag{2}$$
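
As a small numerical illustration of Eqs. (1) and (2), the following Python sketch evaluates the Shannon entropy and the a priori maximal entropy for two detectors mentioned above. The function names are illustrative choices, not notation from the text, and logarithms are taken base 2 so that the results are in bits.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Shannon entropy H = -sum_i p_i log p_i of a discrete distribution, Eq. (1)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p_i = 0 contribute nothing (p log p -> 0 as p -> 0).
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

def max_entropy(n_states, base=2.0):
    """A priori maximal entropy H_max = log N for a detector with N states, Eq. (2)."""
    return math.log(n_states, base)

# A "presence detector" has N = 2 outcomes, so H_max is 1 bit.
presence_hmax = max_entropy(2)

# A fair six-sided die read by a detector that only resolves the top face.
die_hmax = max_entropy(6)
die_entropy = shannon_entropy([1.0 / 6.0] * 6)

print(f"presence detector: H_max = {presence_hmax:.3f} bits")
print(f"fair die: H = {die_entropy:.3f} bits, H_max = {die_hmax:.3f} bits")
```

For the uniform distribution the actual entropy coincides with the maximal one; any detector that resolved more states (orientation, molecular states, and so on) would simply enter through a larger N.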

Experiments using my detector can now sharpen my 
knowledge of the system. By tabulating the frequency 
with which each of the N states appears, we can estimate
the probabilities $p_i$. Note, however, that the entropy
computed from these frequencies is a biased estimate
that approaches the true entropy, Eq. (1),
only in the limit of an infinite number of trials [5]. On
the other hand, some of the possible states of the system
(or more precisely, possible states of my detector inter- 
acting with the system) can be eliminated by using some 
knowledge of the physics of the system. For example, 
we may know some initial data or averages that char- 
acterize the system. This becomes clear in particular if 
the degrees of freedom that we choose to characterize the 
system with are position, momentum, and energy, i.e., if 
we consider the thermodynamical entropy of the system 
(see below). In this respect it is instructive to consider 
for a moment the continuous variable equivalent of the 
Shannon entropy, also known as the differential entropy, 
defined as [I] 

$$h(X) = -\int_S f(x) \log f(x)\, dx , \tag{3}$$

with a probability density function f(x) with support S. 
It turns out that while h(X) is invariant with respect 
to translations [h(X + c) = h(X)], it is not invariant 
under arbitrary coordinate transformations: the entropy 
is renormalized under such changes instead. For example, 

$$h(cX) = h(X) + \log |c| . \tag{4}$$

In particular, this implies that if we introduce a
discretization of continuous space (e.g., via
$p_i = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx$) and consider the limit of the discretized
version as $\Delta \to 0$, we find that

$$H^{\Delta}[p_i] = -\sum_i p_i \log p_i \to h(X) - \log \Delta . \tag{5}$$

Thus, as the resolution of a measurement device is in- 
creased, the entropy is renormalized via an infinite term. 
Of course, we are used to such infinite renormalizations 
from quantum field theory, and just as in the field the- 
ory case, the "unphysical" renormalization is due to an 
unphysical assumption about the attainable precision of 
measurements. Just as in quantum field theory, differences
of quantities do make sense: the shared (or mutual)
differential entropy is finite in the limit $\Delta \to 0$ as
the infinities cancel.
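
The renormalization in Eq. (5) can be checked numerically. The sketch below is a minimal illustration, assuming a zero-mean Gaussian density (a choice made here only because its differential entropy, $\frac{1}{2}\log(2\pi e \sigma^2)$, is known in closed form); the helper names and the bin widths are hypothetical. Natural logarithms are used throughout.

```python
import math

def gaussian_bin_probs(sigma, delta, n_sigma=12.0):
    """p_i = integral of a zero-mean Gaussian density over bins of width delta."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))
    n_bins = int(math.ceil(n_sigma * sigma / delta))
    edges = [i * delta for i in range(-n_bins, n_bins + 1)]
    return [cdf(b) - cdf(a) for a, b in zip(edges[:-1], edges[1:])]

def discrete_entropy(probs):
    """Discrete Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

sigma = 1.0
# Differential entropy of a Gaussian, in nats.
h_exact = 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

rows = []
for delta in (0.5, 0.1, 0.02):
    H = discrete_entropy(gaussian_bin_probs(sigma, delta))
    # H itself grows without bound as delta -> 0; H + log(delta) stays finite.
    rows.append((delta, H, H + math.log(delta)))

for delta, H, renormalized in rows:
    print(f"delta={delta:5.2f}  H={H:7.4f}  H+log(delta)={renormalized:7.4f}  h={h_exact:7.4f}")
```

The discrete entropy increases as the resolution improves, while the combination $H^{\Delta} + \log\Delta$ settles on the differential entropy, which is the content of Eq. (5).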



B. Conditional Entropy 

Let us look at the basic process that reduces uncer- 
tainty: a measurement. When measuring the state of 
system X, I need to bring it into contact with a system 
Y. If Y is my measurement device, then usually I can 
consider it to be completely known (at least, it is com- 
pletely known with respect to the degrees of freedom I 
care about). In other words, my device is in a particular 
state $y_0$ with certainty. After interacting with X, this
is not the case anymore. Let us imagine an interaction
between the systems X and Y that is such that

$$x_i y_0 \to x_i y_i , \qquad i = 1,\ldots,N, \tag{6}$$
that is, the states of the measurement device $y_i$ end up
reflecting the states of X. This is a perfect measurement, 
since no state of X remains unresolved. More generally, 
let X have N states while Y has M states, and let us 
suppose that M < N. Then we can imagine that each 
state of Y reflects an average of a number of X's states, 
so that the probability to find Y in state $y_j$ is given by
$q_j$, where $q_j = \sum_i p_{ij}$ and $p_{ij}$ is the joint probability to
find X in state $x_i$ and Y in state $y_j$. The measurement
process then proceeds as

$$x_i y_0 \to \langle x \rangle_j\, y_j , \tag{7}$$

where

$$\langle x \rangle_j = \sum_i p_{i|j}\, x_i . \tag{8}$$

In Eq. (8) above, I introduced the conditional probability

$$p_{i|j} = \frac{p_{ij}}{q_j} \tag{9}$$

that X is in state i given that Y is in state j. In the
perfect measurement above, this probability was 1 if i = j
and 0 otherwise (i.e., $p_{i|j} = \delta_{ij}$), but in the imperfect
measurement, X is distributed across some of its states
i with a probability distribution $p_{i|j}$, for each j.

We can then calculate the conditional entropy (or re- 
maining entropy) of the system X given we found Y in 
a particular state yj after the measurement: 



$$H(X|Y=y_j) = -\sum_{i=1}^{N} p_{i|j} \log p_{i|j} . \tag{10}$$



This remaining entropy is guaranteed to be smaller
than or equal to the unconditional entropy H(X), because
the worst case scenario is that Y doesn't resolve
any states of X, in which case $p_{i|j} = p_i$. But since we
didn't know anything about X to begin with, $p_i = 1/N$,
and thus $H(X|Y=y_j) \le \log N$.

Let us imagine that we did learn something from the 
measurement of X using Y, and let us imagine further- 
more that this knowledge is permanent. Then we can ex- 
press our new-found knowledge about X by saying that 
we know the probability distribution of X, pi, and this 
distribution is not the uniform distribution pi = 1/N. 
Of course, in principle we should say that this is a conditional
probability $p_{i|j}$, but if the knowledge we have
obtained is permanent, there is no need to constantly 
remind ourselves that the probability distribution is con- 
ditional on our knowledge of certain other variables con- 
nected with X. We simply say that X is distributed 
according to $p_i$, and the entropy of X is

$$H_{\rm actual}(X) = -\sum_i p_i \log p_i . \tag{11}$$



According to this strict view, all Shannon entropies of 
the form (11) are conditional if they are not maximal. 



And we can quantify our knowledge about X simply by 
subtracting this uncertainty from the maximal one: 



$$I = H_{\max}(X) - H_{\rm actual}(X) . \tag{12}$$



This knowledge, of course, is information. We can see 
from this expression that the entropy $H_{\max}$ can be seen as
potential information: it quantifies how much is knowable 
about this system. If my actual entropy vanishes, then 
all of the potential information is realized. 
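
To make Eq. (12) concrete, here is a short Python sketch for a six-state detector. The three distributions, in particular the "loaded die", are hypothetical examples chosen for illustration; entropies are in bits.

```python
import math

def entropy_bits(probs):
    """Actual entropy, Eq. (11), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def information_bits(probs):
    """I = H_max - H_actual, Eq. (12), with H_max = log2 N."""
    return math.log2(len(probs)) - entropy_bits(probs)

uniform = [1.0 / 6.0] * 6                   # nothing is known: I = 0
loaded = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]     # hypothetical loaded die
certain = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # outcome known: I = H_max

info_uniform = information_bits(uniform)
info_loaded = information_bits(loaded)
info_certain = information_bits(certain)

print(f"uniform: {info_uniform:.3f} bits")
print(f"loaded:  {info_loaded:.3f} bits")
print(f"certain: {info_certain:.3f} bits")
```

The last case realizes all of the potential information: the actual entropy vanishes and the information equals $\log_2 6$ bits.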



C. Information



In Eq. (12), we quantified our knowledge about the
states of X by the difference between the maximal and
the actual entropy of the system. This was a special case 
because we assumed that after the measurement, Y was 
in state $y_j$ with certainty, i.e., there is no remaining uncertainty
associated with the measurement device Y (of
course, this is appropriate for a measurement device). In 
a more general scenario where two random variables are 
correlated with each other, we can imagine that Y (after 
the interaction with X) instead is in state yj with proba- 
bility qj (in other words, we have reduced our uncertainty 
about Y somewhat, but we don't know everything about 
it, just as for X). We can then define the average condi- 
tional entropy of X simply as 



$$H(X|Y) = \sum_j q_j\, H(X|Y=y_j) , \tag{13}$$



and the information that Y has about X is then the 
difference between the unconditional entropy H(X) and 
Eq. (13) above, 



$$H(X:Y) = H(X) - H(X|Y) . \tag{14}$$



The colon between X and Y in the expression for the in- 
formation H(X : Y) is conventional, and indicates that 
it stands for an entropy shared between X and Y. According
to our strict definition of unconditional entropies
given above, H(X) = log N, but in the standard literature
H(X) refers to the actual uncertainty of X given
whatever knowledge allowed me to obtain the probability
distribution $p_i$, that is, Eq. (11). In the case nothing is
known a priori about X, Eq. (14) equals Eq. (12).

Eq. (14) can be rewritten to display the symmetry between
the observing system and the observed:

$$H(X:Y) = H(X) + H(Y) - H(XY) , \tag{15}$$

where H(XY) is just the joint entropy of both X and 
Y combined. This joint entropy would equal the sum of
X's and Y's entropies only in the case that there
are no correlations between X's and Y's states. If that
were the case, we could not make any predictions
about X just from knowing something about Y. The 
information (15), therefore, would vanish. 
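
The equivalence of Eqs. (14) and (15) is easy to verify numerically. The sketch below assumes a hypothetical joint distribution $p_{ij}$ for a four-state system X read by a two-state detector Y that reports, with a 10% error rate, which half of X's states occurred; the numbers are illustrative only.

```python
import math

def H(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# Hypothetical joint distribution p_ij: rows are states of X (N = 4),
# columns are states of the detector Y (M = 2).
joint = [
    [0.225, 0.025],
    [0.225, 0.025],
    [0.025, 0.225],
    [0.025, 0.225],
]

p = [sum(row) for row in joint]            # marginal of X, p_i
q = [sum(col) for col in zip(*joint)]      # marginal of Y, q_j = sum_i p_ij

# H(X|Y = y_j) from the conditional probabilities p_{i|j} = p_ij / q_j, Eqs. (9)-(10).
H_given = [H([joint[i][j] / q[j] for i in range(len(p))]) for j in range(len(q))]

# Average conditional entropy, Eq. (13).
H_X_given_Y = sum(qj * hj for qj, hj in zip(q, H_given))

# Information computed two ways: Eq. (14) and the symmetric form Eq. (15).
H_XY = H([pij for row in joint for pij in row])
info_14 = H(p) - H_X_given_Y
info_15 = H(p) + H(q) - H_XY

print(f"H(X) = {H(p):.4f}, H(Y) = {H(q):.4f}, H(XY) = {H_XY:.4f}")
print(f"H(X:Y) via Eq. (14) = {info_14:.4f}, via Eq. (15) = {info_15:.4f}")
```

Because the detector has only two states, it can share at most one bit with X; the error rate reduces the shared entropy below that bound.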

1. Example: Thermodynamics

We can view Thermodynamics as a particular case of 
Shannon theory. First, if we agree that the degrees of 
freedom of interest are position and momentum, then the 
maximal entropy of any system is defined by its volume 
in phase space: 



$$H_{\max} = \log \Delta\Gamma , \tag{16}$$

where $\Delta\Gamma = \Delta p\,\Delta q / k$ is the number of states within the
phase space volume $\Delta p\,\Delta q$. Now the normalization factor
k introduced in (16) clearly serves to coarse grain the
number of states, and should be related to the resolution
of our measurement device. In quantum mechanics, of
course, this factor is given by the amount of phase space
volume occupied by each quantum state, $k = (2\pi\hbar)^n$,
where n is the number of degrees of freedom of the system.
Does this mean that in this case it is not my type of
detector that sets the maximum entropy of the system? 
Actually, this is still true, only that here we assume a 
quantum mechanical perfect detector, while still averag- 
ing over certain internal states of the system inaccessible 
to this detector. 

Suppose I am contemplating a system whose maximum
entropy I have determined to be Eq. (16), but I have some
additional information. For example, I know that this 
system has been undisturbed for a long time, and I know 
its total energy E, and perhaps even the temperature T. 
Of course, this kind of knowledge can be obtained in a
number of different ways. It could be obtained by experiment,
or it could be obtained by inference, or theory.
How does this knowledge reduce my uncertainty? In this 
case, we use our knowledge of physics to predict that the
probabilities $p(p,q)$ going into our entropy

$$H(p,q) = -\sum_{\Delta p \Delta q} p(p,q) \log p(p,q) \tag{17}$$

are given by the canonical distribution

$$p(p,q) = \frac{1}{Z}\, e^{-E(p,q)/T} , \tag{18}$$

where Z is the usual normalization constant, and the sum
in (17) goes over all positions and momenta in the phase
space volume $\Delta p\,\Delta q$. The amount of knowledge we have
about the system according to Eq. (12) is then just the
difference between the maximal and actual uncertainties:

$$I = \log \frac{\Delta\Gamma}{Z} - \frac{E}{T} , \tag{19}$$

where $E = \sum_{\Delta p \Delta q} p(p,q)\, E(p,q)$ is the energy of the system.
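
The closed form in Eq. (19) can be compared against a direct evaluation of Eqs. (12), (17) and (18). The following sketch assumes a small set of discrete states with hypothetical energy levels standing in for the cells of the phase space volume; natural logarithms are used and Boltzmann's constant is set to one.

```python
import math

def canonical_information(energies, T):
    """Return (I, H_actual, Z, E) for discrete states at temperature T (k_B = 1, nats)."""
    n_states = len(energies)                                  # plays the role of Delta Gamma
    weights = [math.exp(-e / T) for e in energies]
    Z = sum(weights)                                          # normalization constant
    probs = [w / Z for w in weights]                          # canonical distribution, Eq. (18)
    E = sum(p * e for p, e in zip(probs, energies))           # mean energy
    H_actual = -sum(p * math.log(p) for p in probs if p > 0.0)  # Eq. (17)
    I = math.log(n_states) - H_actual                         # Eq. (12)
    return I, H_actual, Z, E

levels = [0.0, 1.0, 1.0, 2.0, 3.0]   # hypothetical energy levels, arbitrary units
T = 1.5
I, H_actual, Z, E = canonical_information(levels, T)

# Closed form of Eq. (19): I = log(Delta Gamma / Z) - E / T.
I_closed_form = math.log(len(levels) / Z) - E / T

print(f"I (direct) = {I:.6f} nats, I (Eq. 19) = {I_closed_form:.6f} nats")
```

At very high temperature the canonical distribution becomes uniform and the information vanishes, which is the statement that knowing T then tells us nothing beyond the choice of detector.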



III. QUANTUM THEORY 

In quantum mechanics, the concept of entropy translates
very easily, but the concept of information is thorny.
John von Neumann introduced his eponymous quantum
mechanical entropy as early as 1927 [6], a full 21 years
before Shannon introduced its classical limit! In fact, it
was von Neumann who suggested that Shannon call his
formula (1) 'entropy', simply because, as he said, "your
uncertainty function has been used in statistical mechan- 
ics under that name" [7]. 



A. Measurement 

In quantum mechanics, measurement plays a promi- 
nent role, and is still considered somewhat mysterious in 
many respects. The proper theory to describe measure- 
ment dynamics in quantum physics, not surprisingly, is 
quantum information theory. As in the classical theory, 
the uncertainty about a quantum system can only be de- 
fined in terms of the detector states, which in quantum 
mechanics is a discrete set of eigenstates of a measure- 
ment operator. The quantum system itself is described 
by a wave function, given in terms of the quantum sys- 
tem's eigenbasis, which may or may not be the same as 
the measurement device's basis. 

For example, say we would like to "measure an electron".
In this case, we may mean that we would like to
measure the position of an electron, whose wave function
is given by $\Psi(q)$, where q is the coordinate of the electron.
Further, let the measurement device be characterized initially
by its eigenfunction $\phi_0(\xi)$, where $\xi$ may summarize
the coordinates of the device. Before measurement, i.e.,
before the electron interacts with the measurement device,
the system is described by the wave function

$$\Psi(q)\,\phi_0(\xi) . \tag{20}$$



After the interaction, the wave function is a superposition 
of the eigenfunctions of electron and measurement device 



$$\sum_n \psi_n(q)\,\phi_n(\xi) . \tag{21}$$

Following orthodox measurement theory [8], the classical
nature of the measurement apparatus implies that after
measurement the "pointer" variable $\xi$ takes on a well-defined
value at each point in time; the wave function, as
it turns out, is thus not given by the entire sum in (21)
but rather by the single term

$$\psi_n(q)\,\phi_n(\xi) . \tag{22}$$

The wave function (21) is said to have collapsed to (22).



Let us now study what actually happens in such a mea- 
surement in detail. For ease of notation, let us recast this 
problem into the language of state vectors instead. The 
first stage of the measurement involves the interaction of 
the quantum system Q with the measurement device (or 
"ancilla") A. Both the quantum system and the ancilla 
are fully determined by their state vector, yet, let us assume
that the state of Q (described by state vector $|x\rangle$)
is unknown whereas the state of the ancilla is prepared in
a special state $|0\rangle$, say. The state vector of the combined
system $|QA\rangle$ before measurement then is

$$|\Psi_{t=0}\rangle = |x\rangle|0\rangle = |x,0\rangle . \tag{23}$$

The von Neumann measurement [9] is described by the
unitary evolution of QA via the interaction Hamiltonian

$$\hat H = -\hat X_Q \hat P_A , \tag{24}$$

operating on the product space of Q and A. Here, $\hat X_Q$ is
the observable to be measured, and $\hat P_A$ the operator conjugate
to the degree of freedom of A that will reflect the
result of the measurement. We now obtain for the state
vector $|QA\rangle$ after measurement (e.g., at t = 1, putting
$\hbar = 1$)

$$|\Psi_{t=1}\rangle = e^{i \hat X_Q \hat P_A}|x,0\rangle = e^{i x \hat P_A}|x,0\rangle = |x,x\rangle . \tag{25}$$

Thus, the pointer variable in A that previously pointed 
to zero now also points to the position x that Q is in. 
This operation appears to be very much like the classical
measurement process Eq. (6), but it turns out to be
quite different. In general, the unitary operation (25) introduces
quantum entanglement between the system being
measured and the measurement apparatus, a concept
that is beyond the classical idea of correlations. 

That entanglement is very different from correlations 
becomes evident if we apply the unitary operation de- 
scribed above to an initial quantum state which is in a 
quantum superposition of two states: 

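$$|\Psi_{t=0}\rangle = |x+y, 0\rangle . \tag{26}$$

Then, the linearity of quantum mechanics implies that

$$|\Psi_{t=1}\rangle = e^{i \hat X_Q \hat P_A}\left(|x,0\rangle + |y,0\rangle\right) = |x,x\rangle + |y,y\rangle . \tag{27}$$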



This state is very different from what we would expect in 
classical physics, because Q and A are not just correlated 
(like, e.g., the state $|x+y, x+y\rangle$ would be) but rather
they are quantum entangled. They now form one system 
that cannot be thought of as composite. 
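
The difference between the entangled state of Eq. (27) and a merely correlated product state can be made explicit in the smallest possible setting. The sketch below is a two-level analogue, which is an assumption made here for illustration: the system and the ancilla are each restricted to two basis states, the measurement copies the system's basis label into the ancilla, and the superposition is normalized. It uses the fact that a two-qubit pure state factorizes exactly when the determinant of its coefficient matrix vanishes.

```python
import math

def measure(system_amplitudes):
    """Two-level analogue of Eqs. (25)-(27): the ancilla starts in |0> and
    copies the system's basis label, |x,0> -> |x,x>, extended linearly.
    Returns the coefficient matrix c[x][a] of the joint state sum c[x][a] |x,a>."""
    a0, a1 = system_amplitudes
    return [[a0, 0.0], [0.0, a1]]

def product_state(sys_amps, anc_amps):
    """Coefficient matrix of an unentangled state |sys>|anc>."""
    return [[s * a for a in anc_amps] for s in sys_amps]

def is_product(c, tol=1e-12):
    """A two-qubit pure state factorizes iff det of its coefficient matrix is zero."""
    return abs(c[0][0] * c[1][1] - c[0][1] * c[1][0]) < tol

r = 1.0 / math.sqrt(2.0)
after_basis_state = measure([1.0, 0.0])             # |0,0>: ancilla mirrors the system
after_superposition = measure([r, r])               # (|0,0> + |1,1>)/sqrt(2), cf. Eq. (27)
merely_correlated = product_state([r, r], [r, r])   # analogue of |x+y, x+y>

print("basis state factorizes:       ", is_product(after_basis_state))
print("measured superposition:       ", is_product(after_superposition))
print("product of two superpositions:", is_product(merely_correlated))
```

Only the measured superposition fails to factorize: Q and A can then no longer be assigned state vectors of their own.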

This nonseparability of a quantum system and the de- 
vice measuring it is at the heart of all quantum myster- 
ies. Indeed, it is at the heart of quantum randomness, 
the puzzling emergence of unpredictability in a theory 
that is unitary, i.e., where all probabilities are conserved. 
What is being asked here of the measurement device, 
namely to describe the system Q, is logically impossi- 
ble because after entanglement the system has grown to 
QA. Thus, the detector is being asked to describe a sys- 
tem that is larger (with respect to the possible number 
of states) than the detector, because it includes the de- 
tector itself. This is precisely the same predicament that 
befalls a computer program that is asked to determine 
its own halting probability in the famous Halting Problem
[10] analogue of Gödel's famous Incompleteness Theorem
[11]. Chaitin [12] showed that the self-referential
nature of the question that is posed to a computer program
written to solve the Halting Problem gives rise to
randomness in pure Mathematics: the halting probability
$\Omega = \sum_{p\ \mathrm{halts}} 2^{-|p|}$, where the sum goes over all the programs
p that halt and |p| is the size of those programs, is
random in every way that we measure randomness [13].
A quantum measurement is self-referential in the same 
manner, since the detector is asked to describe its own 
state, which is logically impossible. Thus we see that 
quantum randomness has mathematical, or rather logi- 
cal, randomness at its very heart. 



B. von Neumann Entropy 

Because of the uncertainty inherent in standard projec- 
tive measurements, measurements of a quantum system 
Q are described as expectation values, which are averages 
of an observable over the system's density matrix, so that 



$$\langle \hat O \rangle = {\rm Tr}(\rho_Q \hat O) , \tag{28}$$



where O is an operator associated with the observable we 
would like to measure, and 



$$\rho_Q = {\rm Tr}_A\, |\Psi_{QA}\rangle\langle\Psi_{QA}| \tag{29}$$

is obtained from the quantum wave function $\Psi_{QA}$ (for the
combined system QA, since neither Q nor the measure- 
ment device A separately have a wave function after the 
entanglement occurred) by tracing out the measurement 
device. However, technically, we are observing the states 
of the detector, not the states of the quantum system, so 
instead we need to obtain 



$$\rho_A = {\rm Tr}_Q\, |\Psi_{QA}\rangle\langle\Psi_{QA}| \tag{30}$$



by averaging over the states of the quantum system 
(which strictly speaking is not being observed) and the 
expectation value of the measurement is instead 



$$\langle \hat O \rangle = {\rm Tr}(\rho_A \hat O) . \tag{31}$$



The uncertainty about the quantum system is then as- 
sumed to be given by the uncertainty in the measure- 
ment device A, and can be calculated simply using the 
von Neumann entropy (HJ [3] : 



$$S(\rho_A) = -{\rm Tr}\, \rho_A \log \rho_A . \tag{32}$$

If Q has been measured in A's eigenbasis, then the
density matrix $\rho_A$ is diagonal, and von Neumann entropy
turns into Shannon entropy, as we expect. Indeed, mea- 
suring with respect to the system's eigenbasis is precisely 
the classical limit: entanglement does not happen under 
these conditions. 
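
Equations (30) and (32) can be evaluated by hand for a two-level system and ancilla, and the short sketch below does the same numerically. It is a minimal illustration under assumptions made here: both Q and A are qubits, the joint state has real coefficients, and the 2x2 eigenvalues are obtained from the trace and determinant rather than from a general eigenvalue routine. Entropies are in bits.

```python
import math

def reduced_ancilla(c):
    """rho_A = Tr_Q |Psi><Psi| for a two-qubit pure state with real
    coefficients c[x][a] (system index x, ancilla index a), cf. Eq. (30)."""
    return [[sum(c[x][a] * c[x][b] for x in range(2)) for b in range(2)]
            for a in range(2)]

def entropy_2x2(rho):
    """von Neumann entropy in bits of a real symmetric 2x2 density matrix, Eq. (32)."""
    tr = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    eigenvalues = (tr / 2.0 + disc, tr / 2.0 - disc)
    return -sum(l * math.log2(l) for l in eigenvalues if l > 1e-15)

r = 1.0 / math.sqrt(2.0)
entangled = [[r, 0.0], [0.0, r]]        # (|0,0> + |1,1>)/sqrt(2): superposition was measured
eigenstate = [[1.0, 0.0], [0.0, 0.0]]   # |0,0>: system was in a pointer-basis state

rho_A = reduced_ancilla(entangled)
S_entangled = entropy_2x2(rho_A)
S_eigenstate = entropy_2x2(reduced_ancilla(eigenstate))

print("rho_A =", rho_A)
print(f"S(rho_A) after measuring a superposition: {S_entangled:.4f} bits")
print(f"S(rho_A) after measuring an eigenstate:   {S_eigenstate:.4f} bits")
```

In the first case the detector's density matrix is diagonal with entries 1/2, so its von Neumann entropy equals the Shannon entropy of a fair coin; in the second case no entanglement is created and the detector carries no uncertainty.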

Quantum Information Theory, of course, needs con- 
cepts such as conditional entropies and mutual entropies 
besides the von Neumann entropy. They can be defined 
in a straightforward manner [14, 15], but their interpretation
needs care. For example, we can define a conditional
entropy in analogy to Shannon theory as 

$$S(A|B) = S(AB) - S(B) = -{\rm Tr}_{AB}(\rho_{AB} \log \rho_{AB}) + {\rm Tr}_B(\rho_B \log \rho_B) , \tag{33}$$

where S(AB) is the joint entropy of two systems A and 
B. But can we write this entropy in terms of a condi- 
tional density matrix, just as we were able to write the 
conditional Shannon entropy in terms of a conditional 
probability? The answer is yes and no: a definition of 
conditional von Neumann entropy in terms of a conditional
density operator $\rho_{A|B}$ exists [15, 16], but this operator
is technically not a density matrix (its trace is not
equal to one), and the eigenvalues of this matrix are very
peculiar: they can exceed one (this is of course not possible
for probabilities). Indeed, the eigenvalues can exceed
one only when the system is entangled. As a consequence,
quantum conditional entropies can be negative [15]. This
negative quantum entropy has an operational meaning in
quantum information theory: it quantifies how much additional
information must be conveyed in order to transport
a quantum state if part of a distributed quantum
system is known [17]. If this "partial information" is
negative, the sender and receiver can use the states for 
future communication.

In Fig. 1(a), we can see a quantum
communication process known as "quantum teleportation" [18],
in which the quantum wave function of a qubit
(the quantum analogue of the usual bit, which is a quantum
particle that can exist in superpositions of zero and
one) is transported from the sender "A" (often termed
"Alice") to the receiver "B" (conventionally known as
"Bob"). This can be achieved using an entangled pair of
particles $e\bar e$ (an ebit-anti-ebit pair), where "ebit" stands
for entangled bit [19]. This pair carries no information,
but each element of the pair carries partial information:
in this case the ebit carries one bit, while the anti-ebit
carries minus one bit. Bob sends the ebit over to Alice,
who performs a joint measurement M of the ebit and her qubit and
sends the two classical bits of information back to Bob
(see Fig. 1(a)). Armed with the two classical bits, Bob in
turn can now perform a unitary operation U on the anti-ebit
he has been carrying around, and transform it into
the original qubit that Alice had intended to convey. In
this manner, Bob has used the negative "partial information"
in his anti-ebit to recover the full quantum state,
using only classical information. Note that the anti-ebit
with negative partial information traveling forwards in
time can be seen as an ebit with positive partial information
traveling backwards in time [15]. The process
of super-dense coding [20] can be explained in a similar
manner (see Fig. 1(b)), except here Alice manages to send
2 classical bits by encoding them on the single anti-ebit
she received from Bob.

FIG. 1: Using negative partial information for quantum communication.
(a) In these diagrams, time runs from top to
bottom, and space is horizontal. The line marked "A" is Alice's
space-time trajectory, while the line marked "B" is Bob's.
Bob creates an $e\bar e$ pair (an Einstein-Podolsky-Rosen pair) close
to him, and sends the ebit over to Alice. Alice, armed with
an arbitrary quantum state q, performs a joint measurement
M on both e and q, and sends the two classical bits 2c she
obtains from this measurement back to Bob (over a classical
channel). When Bob receives these two cbits, he performs
one out of four unitary transformations U on the anti-ebit
he is still carrying, conditionally on the classical information
he received. Having done this, he recovers the original quantum
state q, which was "teleported" over to him. The partial
information in e is one bit, while it is minus one for the anti-ebit.
(b) In superdense coding, Alice sends two classical bits
of information 2c over to Bob, but using only a single qubit in
the quantum channel. This process is in a way the "dual" to
the teleportation process, as Alice encodes the two classical
bits by performing a conditional unitary operation U on the
anti-ebit, while it is Bob that performs the measurement M
on the ebit he kept and the qubit Alice sent. Figure adapted
from [15].

Quantum mutual entropy is perhaps even harder to
understand. We can again define it simply in analogy to
Eq. (15) as

$$S(A:B) = S(A) + S(B) - S(AB) , \tag{34}$$

but what does it mean? For starters, this quantum mutual
entropy can be twice as large as the entropy of any
of the subsystems, so A and B can share more quantum 
entropy than they even have by themselves! Of course, 
this is due to the fact, again, that "selves" do not exist 
anymore after entanglement. Also, in the classical the- 
ory, information, that is, shared entropy, could be used 
to make predictions, and therefore to reduce the uncer- 
tainty we have about the system that we share entropy 
with. But that's not possible in quantum mechanics. If, 
for example, I measure the spin of a quantum particle 
that is in an even superposition of its spin-up and spin- 
down state, my measurement device will show me spin-up 
half the time, and spin-down half the time, that is, my 
measurement device has an entropy of one bit. It can 
also be shown that the shared entropy is two bits [15] . 
But this shared entropy cannot be used to make predic- 
tions about the actual spin. Indeed, for the case of the 
even superposition, I still do not know anything about 
it [22]! On the other hand, it is possible, armed with my
measurement result, to make predictions about the state 
of other detectors measuring the same spin. And even 
though all these detectors will agree about their result, 
technically they agree about a random variable (the state 
of the measurement device), not the actual state of the 
spin they believe their measurement device to reflect [23].
Indeed, what else could they agree on, since the spin does 
not have a state? Only the combined system with all the 
measurement devices that have ever interacted with it, 
does [23]. 

Still, the quantum mutual entropy plays a central role 
in quantum information theory, because it plays a similar 
role as the classical mutual entropy in the construction of 
the capacity of an entanglement-assisted channel [24-26].
In this respect, it is unsurprising that the mutual entropy
between two qubits can be as large as 2, as this is the
capacity of the superdense coding channel described in
Fig. 1(b).
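
The negative conditional entropy of Eq. (33) and the two-bit mutual entropy of Eq. (34) can both be seen in a simple family of two-qubit states. The sketch below assumes Bell-diagonal mixtures (a choice made here for illustration, not taken from the text), for which the eigenvalues of $\rho_{AB}$ are just the mixing weights, so that no general eigenvalue routine is needed. The weights used are hypothetical; entropies are in bits.

```python
import math

R = 1.0 / math.sqrt(2.0)
# The four Bell states as amplitude vectors over the basis |00>, |01>, |10>, |11>.
BELL = [
    [R, 0.0, 0.0, R],
    [R, 0.0, 0.0, -R],
    [0.0, R, R, 0.0],
    [0.0, R, -R, 0.0],
]

def bell_diagonal(weights):
    """rho_AB = sum_k w_k |Bell_k><Bell_k|; its eigenvalues are the weights w_k."""
    return [[sum(w * v[i] * v[j] for w, v in zip(weights, BELL)) for j in range(4)]
            for i in range(4)]

def marginal_B(rho):
    """rho_B = Tr_A rho_AB, with basis index i = 2*a + b."""
    return [[rho[b][bp] + rho[2 + b][2 + bp] for bp in range(2)] for b in range(2)]

def H(probs):
    """Shannon entropy in bits of a list of eigenvalues."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def entropies(weights):
    rho_B = marginal_B(bell_diagonal(weights))
    # For Bell-diagonal states rho_B is diagonal, so its entropy is the Shannon
    # entropy of its diagonal; the marginal of A is the same, so S(A) = S(B).
    S_B = H([rho_B[0][0], rho_B[1][1]])
    S_AB = H(weights)
    return {"S(A|B)": S_AB - S_B, "S(A:B)": 2.0 * S_B - S_AB}

pure_bell = entropies([1.0, 0.0, 0.0, 0.0])        # maximally entangled pair
noisy = entropies([0.85, 0.05, 0.05, 0.05])        # hypothetical noisy pair
unentangled_mix = entropies([0.25, 0.25, 0.25, 0.25])   # identity / 4

for name, result in (("pure Bell", pure_bell), ("noisy", noisy), ("uniform mix", unentangled_mix)):
    print(f"{name:12s} S(A|B) = {result['S(A|B)']:+.4f}   S(A:B) = {result['S(A:B)']:.4f}")
```

The pure Bell pair gives S(A|B) = -1 and S(A:B) = 2, the values quoted above for the ebit-anti-ebit pair and for the superdense coding channel, while the uniform mixture has no shared entropy at all.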

The extension of Shannon's theory into the quantum 
regime not only throws new light on the measurement 
problem, but it also helps in navigating the boundary 
between classical and quantum physics. According to 
standard lore, quantum systems (meaning systems de- 
scribed by a quantum wave function) "become" classical 
in the macroscopic limit, that is, if the action unit
associated with that system is much larger than $\hbar$. Quantum
information theory has thoroughly refuted this notion, 
since we now know that macroscopic bodies can be en- 
tangled just as microscopic ones can [27]. Instead, we 
realize that quantum systems appear to follow the rules 
of classical mechanics if parts of their wave function are 
averaged over [such as in Eq. (29)], that is, if the ex- 
perimenter is not in total control of all the degrees of 
freedom that make up the quantum system. Because 
entanglement, once achieved, is not undone by the dis- 
tance between entangled parts, almost all systems will 
seem classical unless expressly prepared, and then pro- 
tected from interaction with uncontrollable quantum sys- 
tems. Unprotected quantum systems spread their state 
over many variables very quickly: a process known as 
decoherence of the quantum state [28].



1. Quantum Thermodynamics

A simple example that illustrates the use of information
theory in (quantum) thermal physics is the Heisenberg
dimer model, defined by the Hamiltonian

$$H = J\,\vec s_1 \otimes \vec s_2 = \frac{J}{4}\,\vec\sigma \otimes \vec\sigma \,, \qquad (35)$$

where $\vec\sigma = (\sigma_x, \sigma_y, \sigma_z)$ are the Pauli matrices. The
system has three degenerate excited states with energy
$J/4$, and a (singlet) ground state with energy $-3J/4$. The
thermal density matrix of the two-spin system can be
written in the product basis as (here, $\beta = 1/T$ is the
inverse temperature)

$$\rho_{12} = \frac{1}{Z}\, e^{-\beta H} = \frac{1}{Z}
\begin{pmatrix}
e^{-\beta J/4} & 0 & 0 & 0 \\
0 & e^{\beta J/4}\cosh\frac{\beta J}{2} & -e^{\beta J/4}\sinh\frac{\beta J}{2} & 0 \\
0 & -e^{\beta J/4}\sinh\frac{\beta J}{2} & e^{\beta J/4}\cosh\frac{\beta J}{2} & 0 \\
0 & 0 & 0 & e^{-\beta J/4}
\end{pmatrix} ,$$

where $Z = {\rm Tr}\, e^{-\beta H} = e^{3\beta J/4} + 3e^{-\beta J/4}$. We can calculate
the von Neumann entropy of the joint system as

$$S(\rho_{12}) = -{\rm Tr}\,\rho_{12}\log\rho_{12} = \log Z + \beta E \,, \qquad (36)$$

where $E$ is the energy

$$E = {\rm Tr}\,\rho_{12} H = \frac{3J}{4}\,\frac{1 - e^{\beta J}}{3 + e^{\beta J}} \,. \qquad (37)$$

The marginal density matrices for each of the spin
subsystems turn out to be

$$\rho_1 = \rho_2 = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} ,$$

as can easily be seen from inspecting $\rho_{12}$ above, so that
$S(\rho_1) = S(\rho_2) = 1$. Using (34) we can calculate the
mutual entropy between the quantum subsystems to find

$$S(1:2) = 2 - \log Z - \beta E \,, \qquad (38)$$

which is formally analogous to the classical result (19),
but has very peculiar quantum properties instead. In
the infinite temperature limit $\beta \to 0$ we see that $Z \to 4$
while $E \to 0$, so the shared entropy vanishes in that limit
as it should: no interactions can be maintained. But
it is clear that at any finite temperature, the quantum
interaction between the spins creates correlations that
can be quantified by the mutual von Neumann entropy
between the spins. In particular, in the limit of zero
temperature we find

$$\log Z \to \frac{3}{4}\beta J \,, \qquad \beta E \to -\frac{3}{4}\beta J \,, \qquad (39)$$

that is, the joint entropy of the spins $S(\rho_{12})$ vanishes and
$S(1:2) \to 2$. In that case, the mutual von Neumann
entropy is that of a pure Einstein-Podolsky-Rosen pair:
the singlet solution

$$|\Psi\rangle = \frac{1}{\sqrt{2}}\left(|\!\uparrow\downarrow\rangle - |\!\downarrow\uparrow\rangle\right) , \qquad (40)$$

and exceeds by a factor of two the entropy of any of the
spins it is composed of. We recognize the wavefunction
of the ground state of the Heisenberg dimer at zero
temperature as the entangled EPR pair that we encountered
earlier, and that was so useful in quantum teleportation
and superdense coding. We will study its behavior under
Lorentz transformations below.
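
The limits quoted for the Heisenberg dimer can be checked numerically. The following is a minimal sketch, not the author's calculation: it builds the Hamiltonian of Eq. (35) with NumPy, forms the thermal state by diagonalization, and evaluates the mutual entropy directly from eigenvalues in bits. The function names and the choice $J = 1$ are illustrative assumptions.

```python
import numpy as np

# Pauli matrices
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def dimer_hamiltonian(J=1.0):
    """H = (J/4) * sum_k sigma_k (x) sigma_k, following Eq. (35)."""
    return (J / 4.0) * sum(np.kron(s, s) for s in (SX, SY, SZ))


def thermal_state(H, beta):
    """rho = exp(-beta H) / Z, computed in the eigenbasis of H."""
    energies, vecs = np.linalg.eigh(H)
    # Subtracting the ground-state energy avoids overflow at large beta.
    weights = np.exp(-beta * (energies - energies.min()))
    weights /= weights.sum()
    return (vecs * weights) @ vecs.conj().T


def entropy_bits(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())


def marginal_spin1(rho12):
    """Trace out the second spin of a two-spin density matrix."""
    return np.einsum('ajbj->ab', rho12.reshape(2, 2, 2, 2))


def mutual_entropy(beta, J=1.0):
    """S(1:2) = S(rho_1) + S(rho_2) - S(rho_12); both marginals are equal here."""
    rho12 = thermal_state(dimer_hamiltonian(J), beta)
    rho1 = marginal_spin1(rho12)
    return 2.0 * entropy_bits(rho1) - entropy_bits(rho12)


if __name__ == "__main__":
    for beta in (1e-6, 0.5, 1.0, 5.0, 50.0):
        print(f"beta = {beta:8.2e}   S(1:2) = {mutual_entropy(beta):.6f} bits")
```

Under these assumptions the printed values should rise from nearly zero at high temperature toward two bits as $\beta$ grows, in line with the discussion around Eqs. (38) and (39).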


IV. RELATIVISTIC THEORY

Once convinced that information theory is a statistical 
theory about the relative states of detectors in a physi- 
cal world, it is clear that we must worry not only about 
quantum detectors, but about moving ones as well. Ein- 
stein's special relativity established an upper limit for the 
speed at which information can be transmitted, without 
the need to cast this problem in an information-theoretic 
language. But in hindsight, it is clear that the impossibil- 
ity of superluminal signaling could just as well have been 
the result of an analysis of the information transmission 
capacity of a communication channel involving detectors 
moving at constant speed with respect to each other. As 
a matter of fact, Jarett and Cover calculated the capacity
of an "additive white Gaussian noise" (AWGN) channel [4]
for information transmission for the case of moving
observers, and found [29]

$$C = W \log(1 + \alpha\,{\rm SNR}) \,, \qquad (41)$$

where $W$ is the bandwidth of the channel, SNR is the
signal-to-noise ratio, and $\alpha = \nu'/\nu$ is the Doppler shift.
As the relative velocity $v/c \to 1$, $\alpha \to 0$ and the
communication capacity vanishes. In the limit $\alpha = 1$, the
standard capacity formula for the Gaussian channel
with limited bandwidth [3] is recovered. Note that in the
limit of an infinite bandwidth channel, Eq. (41) becomes

$$C = \alpha\,{\rm SNR}\,\log_2(e) \ \ \text{bits per second.} \qquad (42)$$
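
To get a feeling for Eqs. (41) and (42), here is a small sketch that evaluates the capacity for a few relative velocities. Two assumptions are mine rather than the text's: the Doppler factor is taken to be the standard longitudinal expression for a receiver receding along the line of sight, and in the wideband limit "SNR" is read as the ratio of signal power to noise spectral density, so that the per-band signal-to-noise ratio is that ratio divided by $W$.

```python
import math


def doppler_factor(beta):
    """nu'/nu for a receiver receding at speed beta = v/c (assumed geometry)."""
    return math.sqrt((1.0 - beta) / (1.0 + beta))


def capacity(bandwidth_hz, snr, alpha):
    """Eq. (41) with a base-2 logarithm: C = W log2(1 + alpha * SNR), in bits/s."""
    return bandwidth_hz * math.log2(1.0 + alpha * snr)


def wideband_capacity(power_over_n0, alpha):
    """Eq. (42): C = alpha * (P/N0) * log2(e); P/N0 is an assumed reading of SNR."""
    return alpha * power_over_n0 * math.log2(math.e)


if __name__ == "__main__":
    W, snr = 1.0e6, 10.0
    for beta in (0.0, 0.5, 0.9, 0.99, 0.9999):
        a = doppler_factor(beta)
        print(f"v/c = {beta:6.4f}  alpha = {a:.4f}  C = {capacity(W, snr, a):.3e} bits/s")
```

The capacity falls monotonically as $v/c$ approaches one, and for very large $W$ at fixed power the first function approaches the second.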



Historically, this calculation seems to have been an
anomaly: no one else seems to have worried about an
"information theory of moving bodies", not least because
such a theory had, or indeed has, little immediate
relevance. Interestingly, the problem that Jarett and
Cover addressed with their calculation was the famous
"twin paradox": a thought experiment in special
relativity that involves a twin journeying into space at
high speed, only to turn around and find that his identical
twin who stayed behind has aged faster. Relativistic
information theory gives a nice illustration of the resolution
of the paradox, where the U-turn that the traveling twin
must undergo creates a switch in reference frames that
affects the information transmission capacities between
the twins, and accounts for the differential aging.

A standard scenario that would require relativistic 
information theory thus involves two random variables 
moving with respect to each other. The question we 
may ask is whether relative motion is going to affect any 
shared entropy between the variables. First, it is im- 
portant to point out that Shannon entropy is a scalar, 
and we therefore do not expect it to transform under 
Lorentz transformations. This is also intuitively clear if 
we adopt the "strict" interpretation of entropy as being 
unconditional (and therefore equal to the logarithm of 
the number of degrees of freedom). On the other hand, 
probability distributions (and the associated conditional
entropies) could conceivably change under Lorentz
transformations. How is this possible given the earlier
statement that entropy is a scalar?

We can investigate this with a gedankenexperiment 
where the system under consideration is an ideal gas, 
with particle velocities distributed according to the 
Maxwell distribution. In order to define entropies, we 
have to agree about which degrees of freedom we are in- 
terested in. Let us say that we only care about the two 
components of the velocity of particles confined in the 
$x$-$y$ plane. Even at rest, the mutual entropy between the
particle velocity components $H(v_x : v_y)$ is non-vanishing,
due to the finiteness of the magnitude of $v$. A detailed
calculation 63 using continuous variable entropies of the 
Maxwell distribution shows that, at rest 



$$H(v_x : v_y) = \log(\pi/e) \,. \qquad (43)$$
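
The value $\log(\pi/e)$ can be reproduced in a simple stand-in model. The sketch below is an illustration under an assumption of mine, not the calculation referred to in the text: it takes the two velocity components to be uniformly distributed over a disk of unit radius, which is one way of reading "finiteness of the magnitude of $v$". The marginal of each component is then a semicircle density, and the mutual entropy follows from differential entropies in nats.

```python
import numpy as np


def semicircle_entropy(n=200_000):
    """Differential entropy (nats) of one component of a uniform unit-disk distribution."""
    x = (np.arange(n) + 0.5) / n * 2.0 - 1.0     # midpoints on (-1, 1)
    p = (2.0 / np.pi) * np.sqrt(1.0 - x**2)      # marginal (semicircle) density
    dx = 2.0 / n
    return float(-(p * np.log(p)).sum() * dx)    # midpoint-rule integral of -p ln p


def disk_mutual_entropy():
    """H(vx : vy) = H(vx) + H(vy) - H(vx, vy) for the uniform unit disk."""
    h_joint = np.log(np.pi)                      # uniform density 1/pi on area pi
    h_marginal = semicircle_entropy()
    return 2.0 * h_marginal - h_joint


if __name__ == "__main__":
    print("numerical  H(vx:vy) =", disk_mutual_entropy())
    print("log(pi/e)           =", np.log(np.pi / np.e))
```

In this toy model the correlation arises purely from the bounded support: knowing $v_x$ restricts the range available to $v_y$.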



The Maxwell velocity distribution, on the other hand,
will surely change under Lorentz transformations in, say,
the $x$-direction, because clearly the components are
affected differently by the boost. In particular, it can be
shown that the mutual entropy between $v_x$ and $v_y$ will
rise monotonically from $\log(\pi/e)$, and tend to a constant
value as the boost velocity $v/c \to 1$. But of course, $v/c$ is
just another variable characterizing the moving system,
and if this is known precisely, then we ought to be able
to recover Eq. (43), and the apparent change in
information is due entirely to a reduction in the uncertainty
$H(v_x)$. This example shows that in information theory,
even if the entire system's entropy does not change under
Lorentz transformations, the entropies of subsystems,
and therefore also information, can.

While a full theory of relativistic information does not 
exist, pieces of such a theory can be found when digging 
through the literature. For example, relativistic
thermodynamics is a limiting case of relativistic information
theory, simply because as we have seen above,
thermodynamical entropy is a limiting case of Shannon entropy.
But unlike in the case constructed above, we do not have 
the freedom to choose our variables in thermodynamics. 
Hence, the invariance of entropy under Lorentz transfor- 
mations is assured via Liouville's theorem, because the 
latter guarantees that the phase-space volume occupied 
by a system is invariant. Yet, relativistic thermodynam- 
ics is an odd theory, not the least because it is intrin- 
sically inconsistent: the concept of equilibrium becomes 
dubious. In thermodynamics, equilibrium is defined as a 
state where all relative motion between the subsystems of 
an ensemble has ceased. Therefore, a joint system where
one part moves with a constant velocity with respect to 
the other cannot be at equilibrium, and relativistic infor- 
mation theory has to be used instead. 

One of the few questions of immediate relevance that 
relativistic thermodynamics has been able to answer is 
how the temperature of an isolated system will appear 
to a moving observer. Of course, temperature itself is
an equilibrium concept and therefore care must be taken
in framing this question [30]. Indeed, both Einstein [31]
and Planck [32] tackled the question of how to
Lorentz-transform temperature, with different results. The
controversy [33] can be resolved by realizing that no such
transformation law can in fact exist [35], as the usual
temperature (the parameter associated with the Planckian
blackbody spectrum) becomes direction-dependent if
measured with a detector moving with velocity $\beta = v/c$
and oriented at an angle $\theta'$ with respect to the
radiation [36, 37]

$$T' = \frac{T\sqrt{1-\beta^2}}{1 - \beta\cos\theta'} \,. \qquad (44)$$



In other words, an ensemble that is thermal in the rest 
frame is non-thermal in a moving frame, and in particular 
cannot represent a standard heat bath because it will be 
non-isotropic. 
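
The anisotropy is easy to tabulate. The sketch below assumes the standard form of the direction-dependent temperature, $T' = T\sqrt{1-\beta^2}/(1-\beta\cos\theta')$, for a detector moving with speed $\beta$ through blackbody radiation of rest-frame temperature $T$; the function name and sample values are illustrative only.

```python
import math


def apparent_temperature(t_rest, beta, theta_prime):
    """Direction-dependent temperature seen by a detector moving at beta = v/c.

    theta_prime is the angle (radians) between the detector's velocity and
    its viewing direction, measured in the detector's frame.
    """
    return t_rest * math.sqrt(1.0 - beta**2) / (1.0 - beta * math.cos(theta_prime))


if __name__ == "__main__":
    t_rest, beta = 1.0, 0.6
    for degrees in (0, 45, 90, 135, 180):
        t = apparent_temperature(t_rest, beta, math.radians(degrees))
        print(f"theta' = {degrees:3d} deg   T'/T = {t:.4f}")
```

Looking forward the radiation appears hotter, looking backward colder, so no single temperature characterizes the bath in the moving frame.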



V. RELATIVISTIC QUANTUM THEORY 

While macroscopic quantities like temperature lose 
their meaning in relativity, microscopic descriptions in 
terms of probability distributions clearly still make sense. 
But in a quantum theory, these probability distributions 
are obtained from quantum measurements specified by 
local operators, and the space-time relationship between 
the detectors implementing these operators becomes im- 
portant. For example, certain measurements on a joint 
(i.e., composite) system may require communication be- 
tween parties, while certain others are impossible even 
though they do not require communication [38]. In general,
a relativistic theory of quantum information needs
to pay close attention to the behavior of the von Neu- 
mann entropy under Lorentz transformation, and how 
such entropies are being reduced by measurement. In 
this section, I discuss the effect of a Lorentz transforma- 
tion on the entropy of a single particle, or a pair of en- 
tangled particles. For the latter case, I study how quan- 
tum entanglement between particles is affected by global 
Lorentz boosts. This formalism has since been used to
study the effect of local Lorentz transformations on the 
von Neumann entropy of a single particle or a pair of 
entangled particles, and I will summarize those results 
too. 



A. Boosting Quantum Entropy 

The entropy of a qubit (which we take here for simplicity
to be a spin-1/2 particle) with wave function

$$|\Psi\rangle = a\,|\!\uparrow\rangle + b\,|\!\downarrow\rangle \qquad (45)$$

($a$ and $b$ are complex numbers), can be written in terms
of its density matrix $\rho = |\Psi\rangle\langle\Psi|$ as

$$S(\rho) = -{\rm Tr}(\rho\log\rho) \,. \qquad (46)$$

A wave function is by definition a completely known state
(called a "pure state"), because the wave function is a
complete description of a quantum system. As a consequence,
(46) vanishes: we have no uncertainty about
this quantum system. As we have seen earlier, it is when
that wave function interacts with uncontrolled degrees
of freedom that mixed states arise. And indeed, just by
Lorentz-boosting a qubit, such mixing will arise [39]. The
reason is not difficult to understand. The wave function
(45), even though I have just stated that it completely
describes the system, in fact only completely describes
the spin degree of freedom! Just as we saw in the earlier
discussion about the classical theory of information,
there may always be other degrees of freedom that our
measurement device (here, a spin-polarization detector)
cannot resolve. Because we are dealing with particles,
ultimately we have to consider their momenta. A more
complete description of the qubit state then is

$$|\Psi\rangle = |\sigma\rangle \times |p\rangle \,, \qquad (47)$$

where $\sigma$ stands for the spin variable, and $p$ is the
particle's momentum. Note that the momentum wave function
$|p\rangle$ is in a product state with the spin wave function
$|\sigma\rangle$. This means that both spin and momentum have
their own state: they are unmixed. But as is taught
in every first-year quantum mechanics course, such
momentum wave functions (plane waves with perfectly
well-defined momentum $p$) do not actually exist; in reality,
they are wave packets with a momentum distribution
$f(p)$, which we may take to be Gaussian. If the system
is at rest, the momentum wave function does not affect
the entropy of (47), because it is a product.

What happens if the particle is boosted? The spin
and momentum degrees of freedom do mix, which we should
have expected because Lorentz transformations always imply
frame rotations as well as changes in linear velocity. The
product wave function (47) then turns into



(48)



which is a state where spin degrees of freedom and
momentum degrees of freedom are entangled. But our
spin-polarization detector is insensitive to momentum! Then
we have no choice but to average over the momentum,
which gives rise to a spin density matrix that is mixed,

$$\rho_\sigma = {\rm Tr}_p\left(|\Psi\rangle\langle\Psi|\right) \,, \qquad (49)$$

and that consequently has positive entropy. Note, however,
that the entropy of the joint spin-momentum density
matrix remains unchanged, at zero. Note also that if
the momentum of the particle was truly perfectly known
from the outset, i.e., a plane wave $|p\rangle$, mixing would also
not take place [40].
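
The mechanism can be caricatured with a two-component wave packet. The sketch below is a toy model of my own, not the actual Wigner-rotation calculation of the cited references: the momentum distribution is replaced by two discrete bins, and the boost is mimicked by rotating the spin in each bin by a different (hypothetical) angle. Tracing over the momentum index then yields a mixed spin state, while the joint state stays pure.

```python
import numpy as np


def rot_y(angle):
    """Spin-1/2 rotation about the y axis by the given angle."""
    c, s = np.cos(angle / 2.0), np.sin(angle / 2.0)
    return np.array([[c, -s], [s, c]])


def entropy_bits(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())


def boosted_state(amplitudes, angles):
    """Amplitudes psi[s, k]: momentum bin k carries a spin-up rotated by angles[k]."""
    up = np.array([1.0, 0.0])
    columns = [f * (rot_y(a) @ up) for f, a in zip(amplitudes, angles)]
    return np.stack(columns, axis=1)             # shape (2 spin states, K momentum bins)


def spin_density_matrix(psi):
    """Partial trace over the momentum index, as in Eq. (49)."""
    return psi @ psi.conj().T


def joint_entropy(psi):
    """Entropy of the full spin-momentum state (zero for any pure state)."""
    v = psi.reshape(-1)
    return entropy_bits(np.outer(v, v.conj()))


if __name__ == "__main__":
    amps = [np.sqrt(0.5), np.sqrt(0.5)]
    for label, angles in (("at rest", [0.0, 0.0]), ("boosted", [0.6, -0.6])):
        psi = boosted_state(amps, angles)
        print(label, " S(spin) =", round(entropy_bits(spin_density_matrix(psi)), 4),
              " S(joint) =", round(abs(joint_entropy(psi)), 4))
```

If both bins receive the same rotation, as for a plane wave, the spin entropy stays at zero, which mirrors the remark about perfectly known momentum.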

While the preceding analysis clearly shows what hap- 
pens to the quantum entropy of a spin-1/2 particle un- 
der Lorentz transformations (a similar analysis can be 
done for photons [41]), what is most interesting in quantum
information theory is the entanglement between systems.
While some aspects of entanglement are captured
by quantum entropies [42] and the spectrum of the
conditional density operator [16], quantifying entanglement is
a surprisingly hard problem, currently without a perfect
solution. However, some good measures exist, in particular
for the entanglement between two-level systems
(qubits) and three-or-fewer level systems [43].



B. Boosting Quantum Entanglement 

If we wish to understand what happens to the entan- 
glement between two massive spin-1/2 particles, say, we 
have to keep track of four variables, the spin states $|\sigma\rangle$
and $|\lambda\rangle$ and the momentum states $|p\rangle$ and $|q\rangle$. A Lorentz
transformation on the joint state of this two-particle sys- 
tem will mix spins and momenta just as in the previous 
example. Let us try to find out how this affects entan- 
glement. 

A good measure for the entanglement of mixed states,
i.e., states that are not pure such as (49), is the so-called
concurrence, introduced by Wootters [44]. This concurrence
$C(\rho_{AB})$ can be calculated for a density matrix $\rho_{AB}$
that describes two subsystems $A$ and $B$ of a larger system,
and quantifies the entanglement between $A$ and $B$.
For our purposes, we will be interested in the entanglement
between the spins $\sigma$ and $\lambda$ of our pair. The
concurrence is one if two degrees of freedom are perfectly
entangled, and vanishes if no entanglement is present.
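
For two qubits the concurrence has a closed form. The sketch below implements the standard Wootters expression, $C = \max(0, \lambda_1 - \lambda_2 - \lambda_3 - \lambda_4)$, where the $\lambda_i$ are the square roots, in decreasing order, of the eigenvalues of $\rho\tilde\rho$ with $\tilde\rho = (\sigma_y\otimes\sigma_y)\rho^*(\sigma_y\otimes\sigma_y)$. The formula is not spelled out in the text, so it is stated here as the assumption the code rests on; the test states are my own choices.

```python
import numpy as np

SY = np.array([[0, -1j], [1j, 0]])
FLIP = np.kron(SY, SY)                            # sigma_y (x) sigma_y


def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix (4 x 4)."""
    rho_tilde = FLIP @ rho.conj() @ FLIP          # spin-flipped state
    eigs = np.linalg.eigvals(rho @ rho_tilde)
    # Eigenvalues are real and non-negative in exact arithmetic; clip noise.
    lam = np.sort(np.sqrt(np.clip(eigs.real, 0.0, None)))[::-1]
    return float(max(0.0, lam[0] - lam[1] - lam[2] - lam[3]))


if __name__ == "__main__":
    singlet = np.array([0, 1, -1, 0]) / np.sqrt(2)
    rho_bell = np.outer(singlet, singlet.conj())
    rho_product = np.diag([1.0, 0.0, 0.0, 0.0])
    rho_mixed = 0.8 * rho_bell + 0.2 * np.eye(4) / 4
    print("Bell state     :", concurrence(rho_bell))
    print("product state  :", concurrence(rho_product))
    print("noisy singlet  :", concurrence(rho_mixed))
```

The noisy singlet illustrates the intermediate case: mixing a Bell state with white noise lowers the concurrence without necessarily destroying it.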

In order to do this calculation we first have to specify 
our initial state. We take this to be a state with spin 
and momentum wave function in a product, but where 
the spin-degrees of freedom are perfectly entangled in a 
so-called Bell state: 



$$\frac{1}{\sqrt{2}}\left(|\!\uparrow,\downarrow\rangle - |\!\downarrow,\uparrow\rangle\right) . \qquad (50)$$



The concurrence of this state can be calculated to be
maximal: $C(\rho_{\sigma\lambda}) = 1$. We now apply a Lorentz boost
to this joint state, i.e., we move our spin-polarization
detector with speed $\beta = v/c$ with respect to this pair (or,
equivalently, we move the pair with respect to the
detector). If the momentum degrees of freedom of the particles
at the outset are Gaussian distributions unentangled with
each other and the spins, the Lorentz boost will entangle
them, and the concurrence will drop [45]. How much
it drops depends on the ratio between the spread of the
momentum distribution $\sigma_r$ (not to be confused with the
spin $\sigma$) and the particle's mass $m$. In Fig. 2 below, the
concurrence is displayed for two different such ratios, as
a function of the rapidity $\xi$. The rapidity $\xi$ here is just
a transformed velocity: $\xi = \tanh^{-1}\beta$, such that $\xi \to \infty$ as
$\beta \to 1$. We can see that if the ratio $\sigma_r/m$ is not too
large, the concurrence will drop but not disappear
altogether. But if the momentum spread is large compared to
the mass, all entanglement can be lost.

FIG. 2: Spin-concurrence as a function of rapidity, for an
initial Bell state with momenta in a product Gaussian. Data
is shown for $\sigma_r/m = 1$ and $\sigma_r/m = 4$ (from Ref. [45]).



Let us consider instead a state that is initially unentangled
in spins, but fully entangled in momenta. I depict
such a wave function in Fig. 3, where a pair is in a
superposition of two states, one moving in opposite directions
with momentum p± in a relative spin state $\Phi^-$ (this is
one of the four Bell spin-entangled states, Eq. (50)), and
one moving in a plane in opposite orthogonal directions
with momentum p, in a relative spin state $\Phi^+$ (which
is Eq. (50) but with a plus sign between the
superpositions). It can be shown that if observed at rest, the spins
are indeed unentangled. But when boosted to rapidity $\xi$,
the concurrence actually increases [45], as for this state
(choosing $m = 1$)

$$C(\rho_{AB}) = \frac{p^2\left(\cosh^2(\xi) - 1\right)}{\left(\sqrt{1+p^2}\,\cosh(\xi) + 1\right)^2} \,. \qquad (51)$$



Thus, Lorentz boosts can, under the right circumstances, 
create entanglement where there had been none before. 
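
A quick evaluation shows how the concurrence of this state grows with rapidity. The sketch assumes the closed form $C = p^2(\cosh^2\xi - 1)/(\sqrt{1+p^2}\cosh\xi + 1)^2$ with $m = 1$; the function name and the sample momentum are illustrative.

```python
import math


def boosted_concurrence(p, xi):
    """Concurrence of the momentum-entangled state at rapidity xi (units with m = 1)."""
    numerator = p**2 * (math.cosh(xi) ** 2 - 1.0)
    denominator = (math.sqrt(1.0 + p**2) * math.cosh(xi) + 1.0) ** 2
    return numerator / denominator


if __name__ == "__main__":
    p = 2.0
    for xi in (0.0, 0.5, 1.0, 2.0, 5.0, 20.0):
        print(f"xi = {xi:5.1f}   C = {boosted_concurrence(p, xi):.6f}")
    print("large-rapidity limit p^2/(1+p^2) =", p**2 / (1.0 + p**2))
```

The concurrence starts at zero at rest and saturates at $p^2/(1+p^2)$, so under this expression the created entanglement never becomes maximal for finite momentum.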

A similar analysis can be performed for pairs of en- 
tangled photons, even though the physics is quite dif- 
ferent [15] . First of all, photons are massless and their 
quantum degree of freedom is the photon polarization. 
The masslessness of the photons makes the analysis a bit 
tricky, because issues of gauge invariance enter into the 
picture, and as all particles move with constant veloc- 
ity (light speed), there cannot be a spread in momentum 
as in the massive case. Nevertheless, Lorentz transforma- 
tion laws acting on polarization vectors can be identified, 
and an analysis similar to the one described above can be 
carried through. The difference is that the entangling ef- 
fect of the Lorentz boost is now entirely due to the spread 
in momentum direction between the two entangled pho- 
ton beams. This implies first of all that fully-entangled 
photon polarizations cannot exist, even at rest, and sec- 
ond that existing entanglement can either be decreased 
or increased, depending on the angle with which the pair 
is boosted (with respect to the angle set by the entangled 
pair), and the rapidity [46].

FIG. 3: Superposition of Bell states $\Phi^+$ and $\Phi^-$ at right
angles, with the particle pair moving in opposite directions.



C. Entanglement in Accelerated Frames 

The logical extension of the work just described is to 
allow for local Lorentz transformations on quantum par- 
ticles, that is, to study particles on relativistic (acceler- 
ating) orbits or in classical gravitational fields. Alsing 
and Milburn, for example, studied the quantum teleportation
channel I discussed earlier [47]. Because quantum
teleportation relies on an entangled pair of particles, the 
fidelity of quantum teleportation (how well Bob's ver- 
sion of Alice's quantum state agrees with the original) 
would suffer if acceleration of either Bob or Alice leads 
to a deterioration of entanglement. This is precisely what 
happens, but the origin of the deterioration of entangle- 
ment is different here: it is not due to the mixing of spin 
and momentum degrees of freedom, but rather due to 
the emergence of "Unruh radiation" in the rest frame of 
the accelerated observer [37J BE] ■ Unruh radiation is a 
peculiar phenomenon that is due to the appearance of a 
sort of "event horizon" for accelerated observers: there 
are regions of spacetime that are causally disconnected 
from an accelerated observer, and this disconnected re- 
gion affects the vacuum fluctuations that occur anywhere 
in space [35115 1| . In this sense, the Unruh radiation is 
analogous to Hawking radiation, which I will discuss in 
more detail in the following section. Unruh radiation 
produces thermal noise in the communication channel, 
which leads to the breakdown of the fidelity of quantum 
teleportation. Because this reasoning applies to all quan- 
tum communication that relies on the assistance of en- 
tanglement, we can conclude that generally the capacity 
of entanglement-assisted channels would be reduced be- 
tween accelerated observers [55]. A similar conclusion 
holds for entangled particles near strong gravitational 
fields [53]. In that case, it is indeed the Hawking radiation
that leads to the deterioration of entanglement
between Einstein-Podolsky-Rosen pairs.



VI. INFORMATION IN CURVED SPACE TIME 

While there are clearly many other questions that can 
conceivably be posed (and hopefully answered) within 
the relatively new field of relativistic quantum informa- 
tion theory [53], I would like to close this review with 
some speculations about quantum information theory in 
curved space time. 

That something interesting might happen to entropies 
in curved space time has been suspected ever since the 
discovery of Hawking radiation [55] that gave rise to the 
black hole information paradox [56]. The paradox has
two parts and can be summarized as follows: According
to standard theory, a non-rotating and uncharged black
hole can be described by an entropy that is determined
entirely in terms of its mass $M$ (in units where $\hbar = G = c = 1$)

$$S_{\rm BH} = 4\pi M^2 \,. \qquad (52)$$



Presumably, a state that is fully known (that is, one that 
is correlated with another system that an observer has in 
its possession) can be absorbed by the black hole. Once 
that state disappears behind the event horizon, the cor- 
relation between that state and its description in the ob- 
server's hands seems to disappear: the information can- 
not be retrieved any longer. Even worse, after a long 
time, the black hole will have evaporated entirely into 
thermal (Hawking) radiation, and the information is not 
only irretrievable, it must have been destroyed. A more 
technical discussion would argue that black holes appear 
to have the capability to turn pure states into mixed 
states without disregarding (tracing over) parts of the
wave function. Such a state of affairs is not only para- 
doxical, but it is in fact completely incompatible not only 
with the standard model of physics, but with the basic 
principle of probability conservation. 

The second part of the problem has to do with the 
entropy balance between the black hole and the radiation 
it emits. When the black hole evaporates via Hawking 
radiation, the emitted radiation is thermal, and carries 
entropy 



'rad 



T, 



H 



(53) 



with black hole temperature $T_H = (8\pi M)^{-1}$. But the
black hole's entropy must also change at the same time, 
and this is determined by the amount of energy that had 
to be spent in order to create the virtual particle pairs 
that gave rise to the radiation. Because mass and temper- 
ature of the black hole are inversely related, the entropy 
decrease of the black hole and the entropy of the emitted 
radiation cannot match. Indeed, we roughly find that 



$$dS_{\rm rad} \approx \frac{4}{3}\, dS_{\rm BH} \,. \qquad (54)$$
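
The bookkeeping behind this mismatch can be made explicit. The sketch below assumes the standard Bekenstein-Hawking relations in natural units, $S_{\rm BH} = 4\pi M^2$ and $T_H = 1/(8\pi M)$, and takes the factor 4/3 from Eq. (54) at face value as a ratio of magnitudes; function names and the sample mass are illustrative.

```python
import math


def bh_entropy(mass):
    """Black hole entropy S = 4 pi M^2 (natural units, assumed)."""
    return 4.0 * math.pi * mass**2


def hawking_temperature(mass):
    """Hawking temperature T_H = 1 / (8 pi M)."""
    return 1.0 / (8.0 * math.pi * mass)


def heat_capacity(mass):
    """dM/dT_H for T_H = 1/(8 pi M); comes out as -8 pi M^2."""
    return -8.0 * math.pi * mass**2


def bh_entropy_loss(mass, d_energy):
    """Magnitude of the entropy lost by the hole when it radiates a small energy d_energy."""
    return d_energy / hawking_temperature(mass)


def radiated_entropy(mass, d_energy):
    """Entropy carried off by the radiation, using the 4/3 factor of Eq. (54)."""
    return (4.0 / 3.0) * bh_entropy_loss(mass, d_energy)


if __name__ == "__main__":
    M, dE = 2.0, 0.01
    print("entropy lost by the hole :", bh_entropy_loss(M, dE))
    print("entropy of the radiation :", radiated_entropy(M, dE))
    print("heat capacity dM/dT      :", heat_capacity(M))
```

With these relations $dS_{\rm BH}/dM = 1/T_H$ holds exactly, and the heat capacity is negative for every mass: the hole heats up as it loses energy.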



Now it should be pointed out that the preceding re- 
sults were obtained within equilibrium thermodynamics 
in curved space time. But since black holes have negative 
heat capacity [57], they can never be at equilibrium, and
the assumptions of that theory are strongly violated. As 
the concept of information itself is a non-equilibrium one, 
we should not be surprised if paradoxical results are ob- 
tained if equilibrium concepts are used to describe such 
a case. Still, a resolution of the microscopic dynamics 
in black hole evaporation is needed. One possible ap- 
proach is to use quantum information theory to charac- 
terize the relative states of the black hole, the stimulated 
radiation emitted during the formation of the black hole, 
and the Hawking radiation (spontaneous emission of
radiation) created in the subsequent evaporation [23, 58].
As we have lost track of the stimulated radiation, we
must always average over it ("trace" it out), which (along 
with tracing out the causally disconnected region that 
lies beyond the Schwarzschild radius) creates the posi- 
tive black hole entropy. In the flat space-time treatment 
of Ref. [23], the entropy balance between the black hole
and the Hawking radiation can be maintained because 
entanglement is spread between the stimulated radiation, 
Hawking radiation, and the black hole. While all three 
are strongly entangled, tracing over the stimulated radia- 
tion produces a state of no correlations between Hawking 
radiation and black hole, implying that the Hawking
radiation appears purely thermal. Of course, the joint
system is still highly entangled, but in order to discover
this entanglement we would have to have access to the 
lost radiation emitted during the formation process. Still, 
this treatment is unsatisfying because it does not resolve 
the ultimate paradox: the unitary description only works 
up until the black hole has shrunk to a particular small 
size. At that point it appears to break down. 

One reason for this breakdown might lie in the inap- 
propriate treatment of quantum entropy in curved space 
time (the preceding formalism ignored curvature). A 
more thorough analysis must take into account the causal 
structure of space time. For example, not all quantum 
measurements are realizable [38], because only those
variables can be simultaneously measured whose separation
is space-like. In physics, we do have a theory that cor- 
rectly describes how different observables interact in a 
manner compatible with the causal structure of space- 
time, namely quantum field theory. In order to con- 
sistently define quantum entropies then, we must define 
them within quantum field theory in curved space-time. 

The first steps toward such a theory involve defining 
quantum fields over a manifold separated into an acces- 
sible and an inaccessible region. This division will occur 
along a world-line, and we shall say that the "inside" 
variables are accessible to me as an observer, while the 
outside ones are not. Note that the inaccessibility can be 
due either to causality, or due to an event horizon. Both 
cases can be treated within the same formalism. States 
in the inaccessible region have to be averaged over, since 
states that differ only in the outside region are
unresolvable. Let me denote the inside region by $R$, while the
entire state is defined on $E$. We can now define a set of
commuting variables $X$ that can be divided into $X_{\rm in}$ and
$X_{\rm out}$. By taking matrix elements of the density matrix
of the entire system

$$\rho = |E\rangle\langle E| \qquad (55)$$

with the complete set of variables $(X_{\rm in}, X_{\rm out})$, we can
construct the inside density matrix (defined on $R$) as

$$\rho_{\rm in} = {\rm Tr}_{X_{\rm out}}\left(\rho_{X_{\rm in} X_{\rm out}}\right) , \qquad (56)$$

which allows me to define the geometric entropy [59] of a
state $E$ for an observer restricted to $R$

$$S(\rho_{\rm in}) = -{\rm Tr}(\rho_{\rm in}\log\rho_{\rm in}) \,. \qquad (57)$$



Here, the trace is performed using the inside variables 
only. 
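
In a finite-dimensional caricature, this construction is just a partial trace. The sketch below is an illustration only: the field state is replaced by a random pure state on a small "inside" and "outside" Hilbert space (dimensions chosen arbitrarily), and the entropy of the inside density matrix is computed from its eigenvalues. None of the divergences of the field-theory case appear in such a toy.

```python
import numpy as np


def random_pure_state(d_in, d_out, seed=0):
    """Random normalized amplitudes psi[i, j] over inside (i) and outside (j) variables."""
    rng = np.random.default_rng(seed)
    psi = rng.normal(size=(d_in, d_out)) + 1j * rng.normal(size=(d_in, d_out))
    return psi / np.linalg.norm(psi)


def inside_density_matrix(psi):
    """rho_in = Tr_out |E><E|: sum over the outside index."""
    return psi @ psi.conj().T


def outside_density_matrix(psi):
    """rho_out = Tr_in |E><E|: sum over the inside index."""
    return psi.T @ psi.conj()


def entropy_bits(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())


if __name__ == "__main__":
    psi = random_pure_state(3, 5)
    print("S(inside)  =", entropy_bits(inside_density_matrix(psi)))
    print("S(outside) =", entropy_bits(outside_density_matrix(psi)))
```

For a pure total state the two entropies coincide, so the entropy assigned to the accessible region equals that of the region that was averaged over.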

This, however, is just the beginning. As with most 
quantities in quantum field theory, this expression is di- 
vergent and needs to be renormalized. Rather than being 
an inconvenience, this is precisely what we should have 
expected: after all, we began this review by insisting 
that entropies only make sense when discussed in terms 
of the possible measurements that can be made on this
system. This is, of course, precisely the role of
renormalization in quantum field theory. Quantum entropies can
be renormalized via a number of methods, either using 
Hawking's zeta function regularization procedure [60] or 
by the "replica trick", writing

$$S(\rho_{\rm in}) = -\left[\frac{d}{dn}\,{\rm Tr}(\rho_{\rm in}^{\,n})\right]_{n=1} , \qquad (58)$$



and then writing dS(n) in terms of the expectation value 
of the stress tensor. A thorough application of this pro- 
gram should reveal components of the entropy due en- 
tirely to the curvature of space-time, and which vanish in 
the flat-space limit. Furthermore, the geometric entropy 
can be used to write equations relating the entropy of the 
inside and the outside space-time regions, as 



$$S(E) = S(\rho_{\rm in,out}) = S(\rho_{\rm in}) + S(\rho_{\rm out}|\rho_{\rm in}) \,. \qquad (59)$$

If $S(\rho_{\rm in})$ is the entropy of the black hole radiation
(together with the stimulated radiation), then $S(\rho_{\rm out}|\rho_{\rm in})$
is the conditional black hole entropy given the radiation 
field, a most interesting quantity in black hole physics. 
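
Both the replica expression and the entropy decomposition can be tried out on small matrices. The sketch below is a finite-dimensional illustration under my own assumptions: entropies are in nats (so that the derivative of ${\rm Tr}\,\rho^n$ at $n = 1$ gives $-{\rm Tr}\,\rho\ln\rho$ without extra factors), the derivative is taken by a central finite difference, and the conditional entropy is evaluated for a two-qubit Bell state standing in for the inside and outside regions.

```python
import numpy as np


def eigen_probs(rho):
    """Non-negligible eigenvalues of a density matrix."""
    p = np.linalg.eigvalsh(rho)
    return p[p > 1e-12]


def entropy_nats(rho):
    """Von Neumann entropy -Tr(rho ln rho)."""
    p = eigen_probs(rho)
    return float(-(p * np.log(p)).sum())


def replica_entropy(rho, eps=1e-5):
    """-d/dn Tr(rho^n) at n = 1, via a central finite difference in n."""
    p = eigen_probs(rho)

    def trace_power(n):
        return float((p**n).sum())

    return -(trace_power(1.0 + eps) - trace_power(1.0 - eps)) / (2.0 * eps)


def conditional_entropy(psi):
    """S(out|in) = S(in,out) - S(in) for pure amplitudes psi[in, out]."""
    rho_in = psi @ psi.conj().T
    v = psi.reshape(-1)
    rho_joint = np.outer(v, v.conj())
    return entropy_nats(rho_joint) - entropy_nats(rho_in)


if __name__ == "__main__":
    rho = np.diag([0.7, 0.2, 0.1])
    print("direct entropy  :", entropy_nats(rho))
    print("replica entropy :", replica_entropy(rho))
    bell = np.array([[0.0, 1.0], [-1.0, 0.0]]) / np.sqrt(2)
    print("S(out|in), Bell :", conditional_entropy(bell), " (-ln 2 =", -np.log(2), ")")
```

The negative conditional entropy of the Bell state is the finite-dimensional analogue of the situation described above: the joint state carries no entropy, yet each part on its own does.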



VII. SUMMARY 

Entropy and information are statistical quantities de- 
scribing an observer's capability to predict the outcome 
of the measurement of a physical system. Once couched 
in those terms, information theory can be examined in 
all physically relevant limits, such as quantum, rela- 
tivistic, and gravitational. Information theory is a non- 
equilibrium theory of statistical processes, and should be 
used under those circumstances (such as measurement, 
non-equilibrium phase transitions, etc.) where an
equilibrium approach is meaningless. Because an observer's
capability to make predictions (quantified by entropy) is 
not a characteristic of the object the predictions apply to, 
it does not have to follow the same physical laws (such
as reversibility) as those that govern the objects. Thus, the
arrow of time implied by the loss of information under 
standard time-evolution is even less mysterious than the 
second law of thermodynamics, which is just a conse- 
quence of the former. 

In time, a fully relativistic theory of quantum infor- 
mation, defined on curved space-time, should allow us 
to tackle a number of problems in cosmology and other 
areas that have as yet resisted a consistent treatment. 
These developments, I have no doubt, would make Shannon
proud.